@benfulton

In Zoltan, you can set either NUM_LOCAL_PARTS or NUM_GLOBAL_PARTS, but setting both is not recommended and can cause partitioning to fail.

Setting both NUM_GLOBAL_PARTS and NUM_LOCAL_PARTS together has issues
@Patol75
Contributor

Patol75 commented Feb 11, 2026

Just for context, I found the following in the Zoltan source code. It does not seem to me that setting both NUM_GLOBAL_PARTS and NUM_LOCAL_PARTS is prohibited, but there appears to be a compatibility pattern to respect between the two. Are we actually breaking it?

@benfulton
Author

I discussed this with Stephan but could have written it up better here. We attempted to run flredecomp on 4 nodes with 64 cores per node, and thus passed -o 256 as a parameter. The result was the error:

Sum of NUM_LOCAL_PARTS 256 < NUM_GLOBAL_PARTS 512

and a failure in Zoltan_LB_Partition (which I discovered after I replaced the assert(ierr) with an FLAbort). I assume the 512 comes from flredecomp setting 256 global parts and also one local part per rank, although I couldn't verify that in the Zoltan code. I opened a Trilinos discussion and was told that setting both is not recommended. This PR appears to fix the issue, although we are still trying to get Fluidity running.
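
To make the numbers in that error message concrete, here is a minimal Python sketch of the one consistency condition the message reports. It is only a model of the check implied by the error text, not Zoltan's actual code. The function name `check_part_counts` is hypothetical, and the sketch does not try to explain where the 512 comes from.

```python
def check_part_counts(local_parts_per_rank, num_global_parts):
    """Hypothetical model of the condition reported in the error above.

    Returns None when the summed local part counts cover the requested
    global part count, otherwise an error string in the same form as the
    message quoted in this thread.
    """
    total_local = sum(local_parts_per_rank)
    if total_local < num_global_parts:
        return (
            f"Sum of NUM_LOCAL_PARTS {total_local} "
            f"< NUM_GLOBAL_PARTS {num_global_parts}"
        )
    return None


# 4 nodes with 64 cores per node gives 256 ranks, one local part per rank.
nodes, cores_per_node = 4, 64
local_parts = [1] * (nodes * cores_per_node)

print(check_part_counts(local_parts, 512))  # the mismatch seen in the run
print(check_part_counts(local_parts, 256))  # a consistent pair
```

With one local part on each of the 256 ranks, a global count of 512 trips the condition while 256 does not, which matches the figures in the reported message.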

Flake8 <6.1.0 and Python 3.12 seem to have issues, throwing up errors in
the middle of f-strings.

Fixed in 6.1.0: PyCQA/pycodestyle#1148

Upgrade to 6.1.0 only, as any newer versions introduce more errors that
either need fixing, or ignoring explicitly - which should probably be
done in a separate PR.
Contributor

@stephankramer stephankramer left a comment

Thanks for the link to that discussion!
Yeah, I'm also still confused as to why we are seeing that 512 pop up, but given that they say it's not recommended, and it seems to "help" on our cluster build - assuming the tests pass with it, I'd say we go with this

Also good to actually check the Zoltan return code and immediately stop with an error. Previously, errors would be picked up by the assert only when Fluidity is built with debugging.
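
As an illustration of the pattern described here (an unconditional return-code check instead of a debug-only assert), a small Python sketch follows. The constants mirror the values of Zoltan's C return codes; `ZoltanError` and `check_zoltan` are hypothetical names, and treating every non-OK code as fatal is an assumption rather than what the PR necessarily does.

```python
# Values as defined for Zoltan's C return codes.
ZOLTAN_OK = 0
ZOLTAN_WARN = 1
ZOLTAN_FATAL = -1
ZOLTAN_MEMERR = -2


class ZoltanError(RuntimeError):
    """Hypothetical exception standing in for an FLAbort-style hard stop."""


def check_zoltan(ierr, routine):
    """Stop immediately on any non-OK return code.

    Unlike an assert, this check is not removed in optimised
    (non-debug) builds, so the failure is always reported.
    Assumption: warnings are treated as errors too.
    """
    if ierr != ZOLTAN_OK:
        raise ZoltanError(f"{routine} returned error code {ierr}")


check_zoltan(ZOLTAN_OK, "Zoltan_LB_Partition")  # passes silently

try:
    check_zoltan(ZOLTAN_FATAL, "Zoltan_LB_Partition")
except ZoltanError as err:
    print(err)
```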

Contributor

@Patol75 Patol75 left a comment

Sounds good, thank you for providing context. :)
